In this project, we classify brain tumours from MRI images. A brain tumour is a mass of abnormal cells located in a person's brain. As tumours grow, they press against the skull, increasing intracranial pressure and potentially impairing brain function. Convolutional neural networks (CNNs) can learn discriminative features directly from MRI images, enabling efficient classification. Our dataset has 4 classes: glioma, meningioma, no tumour, and pituitary.
Our end-to-end pipeline starts with data analysis and pre-processing. First, we checked that the dataset was balanced. The 512x512 RGB input images were then resized to 128x128, and random augmentations were applied during training: a horizontal flip, a rotation of up to 30 degrees, and changes to contrast and brightness. We then split the data into training, validation, and testing sets: the provided training folder was split 80/20 into training and validation, and the provided testing folder was held out, giving roughly a 65/16/19 overall split.
For our models, we used a homemade CNN as well as VGG-16. We trained our 3-layer homemade CNN from scratch, while for VGG-16 we used transfer learning: the pre-trained layers were frozen and the final layer of the classifier was replaced with a newly trained one. This was necessary because the original network has 1000 output classes while ours has only 4.
We trained each model with 8 hyperparameter combinations: batch sizes of 128 and 256, 25 and 50 epochs, and learning rates of 0.01 and 0.005. After running all 8 combinations for each model, we selected the best configuration based on the accuracy curves and confusion matrices that were produced. The best model was the VGG-16 transfer-learning model with a batch size of 256, a learning rate of 0.005, and 25 epochs; with more epochs, the model started to overfit.
We evaluated the best model on the held-out testing dataset as well as on brand-new data from an additional source. On the new dataset, the best model performed significantly better, with 48% accuracy, compared to 36% for the homemade CNN. Lastly, we discuss our challenges, lessons learned, and future recommendations.
https://colab.research.google.com/drive/1rhrWZj5V9Caa12AoXrT1ZJkIt_-hYPnn?usp=sharing
Training and Validation:
https://drive.google.com/drive/folders/1hTnjlloONqFcUDtdCtedT9kEqY82a_wh?usp=share_link
'New Data':
https://github.com/sartajbhuvaji/brain-tumor-classification-dataset
!pip install torchmetrics
Successfully installed torchmetrics-0.11.4
!pip install torchinfo
Successfully installed torchinfo-1.7.2
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from sklearn import metrics
from sklearn.model_selection import train_test_split
import math
import torchmetrics
from torchinfo import summary
from torch.utils.data import DataLoader, Subset
from torchvision import datasets,transforms, models
from pathlib import Path
from PIL import Image, ImageEnhance
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Mounted at /content/drive
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
cuda
# Create the file path to the training/validation and testing datasets
path = Path("/content/drive/MyDrive/Colab Notebooks/MIE1517/MIE1517/archive")
dem_path = '/content/drive/MyDrive/Colab Notebooks/MIE1517/MIE1517/Brain-Tumor-Classification-DataSet-master/Testing'
train_path = path /'Training'
test_path = path/ 'Testing'
torch.manual_seed(0)
<torch._C.Generator at 0x7f95ac844850>
# Create the ImageFolders for the training/validation and testing dataset
train_data = datasets.ImageFolder(root=train_path)
test_data = datasets.ImageFolder(root=test_path)
#Check the size of the datasets
print('Training size: ', len(train_data))
print('Testing size: ', len(test_data))
Training size: 5742
Testing size: 1311
#Identify the classes and associated labels
train_data.class_to_idx
{'glioma': 0, 'meningioma': 1, 'notumor': 2, 'pituitary': 3}
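When turning predicted indices back into class names later on, the mapping above can simply be inverted; a minimal stdlib sketch (the dictionary is copied from the ImageFolder output above):

```python
# Invert the class_to_idx mapping reported by ImageFolder so a predicted
# index can be converted back into its class name.
class_to_idx = {'glioma': 0, 'meningioma': 1, 'notumor': 2, 'pituitary': 3}
idx_to_class = {v: k for k, v in class_to_idx.items()}
print(idx_to_class[2])  # notumor
```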
# Check to see if training dataset is balanced
class_labels = train_data.class_to_idx
class_names = train_data.classes
num_classes = len(train_data.classes)
num_images = np.zeros(num_classes)
for i in train_data.samples:
    num_images[i[1]] += 1
plt.pie(num_images, labels=train_data.classes, autopct='%.1f%%')
plt.title('Class Distribution - Train Dataset')
plt.show()
#View an image
img, label = next(iter(train_data))
img
# Check size and colour mode of the images
img.size, img.mode
((512, 512), 'RGB')
# Define the re-sized image size
image_size = 128
'''
Resize: resize images to 128x128.
RandomHorizontalFlip: flipping along the horizontal axis swaps the left
    hemisphere with the right one, and vice versa. This operation can help
    deep classifiers, especially those benefitting from contextual tumour
    information, become invariant to tumour position (i.e. tumour on the
    left side vs. the right side).
RandomRotation: randomly rotate training images by up to 30 degrees.
ColorJitter: randomly change the brightness and contrast of images within
    a given range.
'''
train_transforms = transforms.Compose([
    transforms.Resize(size=(image_size, image_size)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(30),
    transforms.ColorJitter(brightness=(0.5, 1.5), contrast=(0.5, 1.5)),
    transforms.ToTensor()
])
test_transforms = transforms.Compose([
    transforms.Resize(size=(image_size, image_size)),
    transforms.ToTensor()
])
# Training, Validation and Testing dataset for the original dataset
full_train_data = datasets.ImageFolder(root=train_path,
                                       transform=train_transforms,
                                       target_transform=None)
test_data = datasets.ImageFolder(root=test_path, transform=test_transforms)
# Dataset for the "new data"
test_dem = datasets.ImageFolder(root=dem_path, transform = test_transforms)
# Split full_train_data into training and validation datasets (80/20 ratio)
# random_state is fixed so the split is reproducible (torch.manual_seed does
# not seed sklearn)
train_idx, val_idx = train_test_split(list(range(len(full_train_data))),
                                      test_size=0.2, shuffle=True,
                                      random_state=0)
train_data = Subset(full_train_data, train_idx)
val_data = Subset(full_train_data, val_idx)
len(train_data), len(val_data), len(test_data)
(4593, 1149, 1311)
#Load images in batches
batch_size = 64
train_dataloader = DataLoader(train_data, batch_size = batch_size, shuffle=True)
val_dataloader = DataLoader(val_data, batch_size = batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size = batch_size, shuffle = False)
# Brief visualization of the transformed images from the training dataset
for i in range(8):
    plt.subplot(2, 4, i+1)
    plt.axis('off')
    img = train_data[i*2]
    plt.imshow(img[0].permute(1, 2, 0))
    plt.title(class_names[img[1]])
# Download the VGG-16 model with the pre-trained weights
vgg_tf_model = models.vgg16(weights='DEFAULT')
vgg_tf_model.name = 'VGG16'
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
# Feature Extractor (Convolutional Neural Network)
vgg_tf_model.features
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (18): ReLU(inplace=True)
  (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace=True)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace=True)
  (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (25): ReLU(inplace=True)
  (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (27): ReLU(inplace=True)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace=True)
  (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
# Classifier (Linear Neural Network)
vgg_tf_model.classifier
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)
# Make sure all the existing layers in vgg_tf_model are non-trainable
for param in vgg_tf_model.parameters():
    param.requires_grad = False
# Split the feature extractor and classifier into different models
vgg_tf_features = vgg_tf_model.features
vgg_tf_features.name = 'VGG16_Features'
vgg_tf_classifier = vgg_tf_model.classifier
vgg_tf_classifier.name = 'VGG16_Classifier'
# Get the number of inputs for the final classifier layer, as we are replacing it
vgg_tf_class_num_inputs = vgg_tf_classifier[-1].in_features
# Create a new final classifier layer with 4 output classes
vgg_tf_classifier[-1] = nn.Sequential(
    nn.Linear(vgg_tf_class_num_inputs, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, num_classes))
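As a sanity check, the trainable-parameter count of this new head can be verified with plain arithmetic (no torch required); the total should match the trainable count reported by `get_parameters` further below:

```python
# Trainable parameters of the replacement head: Linear(4096 -> 256) followed
# by Linear(256 -> 4). A Linear layer has in*out weights plus out biases.
def linear_params(n_in, n_out):
    return n_in * n_out + n_out

head_params = linear_params(4096, 256) + linear_params(256, 4)
print(head_params)  # 1049860
```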
# Check the new classifier
vgg_tf_classifier
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Sequential(
    (0): Linear(in_features=4096, out_features=256, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=256, out_features=4, bias=True)
  )
)
# Function to help check the number of total and trainable parameters in a model
def get_parameters(model):
    total_params = train_params = 0
    for p in model.parameters():
        total_params += p.numel()
        if p.requires_grad:
            train_params += p.numel()
    print(f'{model.name} has {total_params:,} total parameters and {train_params:,} trainable parameters.')
get_parameters(vgg_tf_classifier)
VGG16_Classifier has 120,595,716 total parameters and 1,049,860 trainable parameters.
# Check the final VGG-16 model (should be the same as the diagrams above)
vgg_tf_model
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Sequential(
      (0): Linear(in_features=4096, out_features=256, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=256, out_features=4, bias=True)
    )
  )
)
# Extract the latent space of the model derived from the training dataset
vgg_tf_features = []
vgg_tf_labels = []
batch = 0
for images, targets in train_dataloader:
    print(batch, end='')
    batch += 1
    # Extract features using the VGG-16 network
    outputs = vgg_tf_model.features(images)
    outputs = vgg_tf_model.avgpool(outputs)
    # Flatten the features to a 1D tensor
    outputs = torch.flatten(outputs, start_dim=1)
    # Add the features and labels to the list
    vgg_tf_features.append(outputs)
    vgg_tf_labels.append(targets)
features = torch.cat(vgg_tf_features)
labels = torch.cat(vgg_tf_labels)
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071
# 1st dimension of features and labels should equal the size of the training set
features.shape, labels.shape
(torch.Size([4593, 25088]), torch.Size([4593]))
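The 25088-dimensional feature vectors follow directly from the VGG-16 architecture: the convolutional stack outputs 512 feature maps, and the adaptive average pool reduces each to 7x7 before flattening. A quick arithmetic check:

```python
# VGG-16 latent dimension: 512 feature maps, each average-pooled to 7x7,
# then flattened into one vector per image.
channels, pooled_h, pooled_w = 512, 7, 7
flat_dim = channels * pooled_h * pooled_w
print(flat_dim)  # 25088
```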
# Wrap the features and labels into a single dataset
features_concat = torch.utils.data.TensorDataset(features, labels)
# Check to see if features feeds correctly into the VGG-16 classifier
summary(vgg_tf_classifier, input_data=features)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               [1149, 4]                 --
├─Linear: 1-1                            [1149, 4096]              33,558,528
├─ReLU: 1-2                              [1149, 4096]              --
├─Dropout: 1-3                           [1149, 4096]              --
├─Linear: 1-4                            [1149, 4096]              (16,781,312)
├─ReLU: 1-5                              [1149, 4096]              --
├─Dropout: 1-6                           [1149, 4096]              --
├─Sequential: 1-7                        [1149, 4]                 --
│    └─Linear: 2-1                       [1149, 256]               1,048,832
│    └─ReLU: 2-2                         [1149, 256]               --
│    └─Dropout: 2-3                      [1149, 256]               --
│    └─Linear: 2-4                       [1149, 4]                 1,028
==========================================================================================
Total params: 51,389,700
Trainable params: 34,608,388
Non-trainable params: 16,781,312
Total mult-adds (G): 59.05
==========================================================================================
Input size (MB): 37.65
Forward/backward pass size (MB): 77.69
Params size (MB): 205.56
Estimated Total Size (MB): 320.90
==========================================================================================
# Extract the latent space of the model derived from the validation dataset
valid_vgg_tf_features = []
valid_vgg_tf_labels = []
batch = 0
for images, targets in val_dataloader:
    print(batch, end='')
    batch += 1
    # Extract features using the VGG-16 network
    valid_outputs = vgg_tf_model.features(images)
    valid_outputs = vgg_tf_model.avgpool(valid_outputs)
    # Flatten the features to a 1D tensor
    valid_outputs = torch.flatten(valid_outputs, start_dim=1)
    # Add the features and labels to the list
    valid_vgg_tf_features.append(valid_outputs)
    valid_vgg_tf_labels.append(targets)
valid_features = torch.cat(valid_vgg_tf_features)
valid_labels = torch.cat(valid_vgg_tf_labels)
01234567891011121314151617
valid_features.shape, valid_labels.shape
(torch.Size([1149, 25088]), torch.Size([1149]))
valid_features_concat = torch.utils.data.TensorDataset(valid_features, valid_labels)
# Extract the latent space of the model derived from the test dataset
test_vgg_tf_features = []
test_vgg_tf_labels = []
batch = 0
for images, targets in test_dataloader:
    print(batch, end='')
    batch += 1
    # Extract features using the VGG-16 network
    outputs = vgg_tf_model.features(images)
    outputs = vgg_tf_model.avgpool(outputs)
    # Flatten the features to a 1D tensor
    outputs = torch.flatten(outputs, start_dim=1)
    # Add the features and labels to the list
    test_vgg_tf_features.append(outputs)
    test_vgg_tf_labels.append(targets)
test_features = torch.cat(test_vgg_tf_features)
test_labels = torch.cat(test_vgg_tf_labels)
01234567891011121314151617181920
test_features.shape, test_labels.shape
(torch.Size([1311, 25088]), torch.Size([1311]))
test_features_concat = torch.utils.data.TensorDataset(test_features, test_labels)
class HomemadeCNN(nn.Module):
    def __init__(self, dropout_rate=0.4, num_filters=25, batch_size=64):
        super(HomemadeCNN, self).__init__()
        self.name = "Three_layer_CNN"
        self.dropout_rate = dropout_rate
        self.num_filters = num_filters
        self.batch_size = batch_size
        self.conv1 = nn.Conv2d(3, 5, 5)
        self.pool = nn.MaxPool2d(3, 3)
        self.conv2 = nn.Conv2d(5, 10, 3)
        self.conv3 = nn.Conv2d(10, self.num_filters, 3)
        self.dropout = nn.Dropout(dropout_rate)
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(self.num_filters * 9, 64)
        self.fc2 = nn.Linear(64, 4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.flat(x)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
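The `in_features` of `fc1` (`num_filters * 9`) comes from tracing the spatial size of a 128x128 input through the three conv+pool stages. A small sketch of that arithmetic (stride-1 unpadded convolutions, 3x3 max-pool with stride 3), independent of torch:

```python
# Trace the spatial size of a 128x128 input through HomemadeCNN.
def conv_out(size, kernel):
    # stride-1 convolution with no padding
    return size - kernel + 1

def pool_out(size, kernel=3, stride=3):
    return (size - kernel) // stride + 1

s = 128
s = pool_out(conv_out(s, 5))  # conv1 (5x5) + pool -> 41
s = pool_out(conv_out(s, 3))  # conv2 (3x3) + pool -> 13
s = pool_out(conv_out(s, 3))  # conv3 (3x3) + pool -> 3
print(s, 25 * s * s)  # 3 225
```

The final 25 x 3 x 3 = 225 matches `fc1`'s `in_features` with the default `num_filters=25`.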
hm_model = HomemadeCNN()
hm_model
HomemadeCNN(
  (conv1): Conv2d(3, 5, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(5, 10, kernel_size=(3, 3), stride=(1, 1))
  (conv3): Conv2d(10, 25, kernel_size=(3, 3), stride=(1, 1))
  (dropout): Dropout(p=0.4, inplace=False)
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=225, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=4, bias=True)
)
summary(hm_model, input_size=[64, 3, 128, 128])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
HomemadeCNN                              [64, 4]                   --
├─Conv2d: 1-1                            [64, 5, 124, 124]         380
├─MaxPool2d: 1-2                         [64, 5, 41, 41]           --
├─Conv2d: 1-3                            [64, 10, 39, 39]          460
├─MaxPool2d: 1-4                         [64, 10, 13, 13]          --
├─Conv2d: 1-5                            [64, 25, 11, 11]          2,275
├─MaxPool2d: 1-6                         [64, 25, 3, 3]            --
├─Flatten: 1-7                           [64, 225]                 --
├─Dropout: 1-8                           [64, 225]                 --
├─Linear: 1-9                            [64, 64]                  14,464
├─Linear: 1-10                           [64, 4]                   260
==========================================================================================
Total params: 17,839
Trainable params: 17,839
Non-trainable params: 0
Total mult-adds (M): 437.28
==========================================================================================
Input size (MB): 12.58
Forward/backward pass size (MB): 48.73
Params size (MB): 0.07
Estimated Total Size (MB): 61.39
==========================================================================================
get_parameters(hm_model)
Three_layer_CNN has 17,839 total parameters and 17,839 trainable parameters.
# Takes in a VGG-16 model and creates a new final classifier layer
def make_tf_classifier(model):
    classifier = model.classifier
    classifier.name = model.name + '_Classifier'
    # Freeze the existing classifier parameters (iterate over parameters,
    # not the sub-modules themselves)
    for param in classifier.parameters():
        param.requires_grad = False
    last_layer_inputs = classifier[-1].in_features
    classifier[-1] = nn.Sequential(
        nn.Linear(last_layer_inputs, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, num_classes))
    return classifier
# Takes a VGG-16 model and dataset, and calculates the model's accuracy
def get_vgg_accuracy(model, data, bs):
    loader = torch.utils.data.DataLoader(data, batch_size=bs, shuffle=True)
    correct = 0
    total = 0
    for imgs, labels in loader:
        if torch.cuda.is_available():
            imgs = imgs.to(device)
            labels = labels.to(device)
        # No gradients are needed for evaluation
        with torch.no_grad():
            output = model(imgs)
        output = nn.Softmax(dim=1)(output)
        correct += (torch.argmax(output, dim=1) == labels).float().sum()
        total += imgs.shape[0]
    return (100 * correct / total).cpu()
# Train a VGG-16 model with different hyperparameters
def train_vgg_model(model, num_epochs, batch_size=128, lr=1e-2, optim_adam=True, use_cuda=True):
    if use_cuda and torch.cuda.is_available():
        model.to(device)
    criterion = nn.CrossEntropyLoss()
    if optim_adam:
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-3)
    train_err = np.zeros(num_epochs)
    train_losses = np.zeros(num_epochs)
    train_accs = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    val_accs = np.zeros(num_epochs)
    model.train()
    for epoch in range(num_epochs):
        train_loss = 0
        total_epochs = 0
        labels = []
        predictions = []
        for X, label in torch.utils.data.DataLoader(features_concat, batch_size=batch_size, shuffle=True):
            X, label = X.to(device), label.to(device)
            optimizer.zero_grad()
            y_pred = model(X)
            loss = criterion(y_pred, label)
            train_loss += loss.item()
            loss.backward()
            optimizer.step()
            y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
            total_epochs += len(label)
            labels.append(label.tolist())
            predictions.append(y_pred_class.tolist())
        train_accs[epoch] = get_vgg_accuracy(model, features_concat, len(train_data))
        val_accs[epoch] = get_vgg_accuracy(model, valid_features_concat, len(val_data))
        print(f'Epoch #{epoch}: Training Accuracy - {train_accs[epoch]}%, Validation Accuracy - {val_accs[epoch]}%')
        if epoch == num_epochs - 1:
            model_path = get_model_name(model.name, batch_size, lr, epoch)
            torch.save(model.state_dict(), model_path)
    # Use get_vgg_accuracy here (not get_accuracy), since the model consumes
    # pre-extracted features rather than raw images
    print(f'Testing Accuracy: {get_vgg_accuracy(model, test_features_concat, len(test_data))}')
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_losses)
    np.savetxt("{}_train_acc.csv".format(model_path), train_accs)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)
    np.savetxt("{}_val_acc.csv".format(model_path), val_accs)
    return model_path, sum(labels, []), sum(predictions, [])
# Testing
def evaluate(model, loader, criterion, use_cuda=True):
    '''
    Inputs:
        model: nn.Module network we want to evaluate
        loader: DataLoader to evaluate on (train, validation, or test)
        criterion: loss criterion
    Output:
        err, loss for the data loader provided
    '''
    total_loss = 0.0
    total_accuracy = 0.0
    total_epoch = 0
    for i, data in enumerate(loader, 0):
        inputs, labels = data
        if use_cuda and torch.cuda.is_available():
            inputs = inputs.cuda()
            labels = labels.cuda()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        corr = (torch.argmax(outputs, axis=1) == labels)
        total_accuracy += int(corr.sum())
        total_loss += loss.item()
        total_epoch += len(labels)
    err = 1 - (float(total_accuracy) / total_epoch)
    loss = float(total_loss) / (i + 1)
    return err, loss
# Save model output
def get_model_name(name, batch_size, learning_rate, epoch):
    path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
                                                   batch_size,
                                                   learning_rate,
                                                   epoch)
    return path
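As a quick illustration of the naming scheme (the helper is restated so the snippet runs standalone), the best VGG configuration reported earlier (batch size 256, learning rate 0.005, final epoch 24) produces:

```python
# Same format string as get_model_name above, restated for a standalone check.
def get_model_name(name, batch_size, learning_rate, epoch):
    return "model_{0}_bs{1}_lr{2}_epoch{3}".format(name, batch_size, learning_rate, epoch)

print(get_model_name('VGG16_Classifier', 256, 0.005, 24))
# model_VGG16_Classifier_bs256_lr0.005_epoch24
```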
# Takes a homemade model and dataset, and calculates the model's accuracy
def get_accuracy(model, data, bs, use_cuda=True):
    loader = torch.utils.data.DataLoader(data, batch_size=bs, shuffle=True)
    correct = 0
    total = 0
    for imgs, labels in loader:
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.to(device)
            labels = labels.to(device)
        # No gradients are needed for evaluation
        with torch.no_grad():
            output = model(imgs)
        output = nn.Softmax(dim=1)(output)
        correct += (torch.argmax(output, dim=1) == labels).float().sum()
        total += imgs.shape[0]
    return (100 * correct / total).cpu()
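The accuracy helpers above all follow the same argmax-then-compare pattern; the same logic on plain Python lists (toy scores, not real model outputs) looks like this:

```python
# Toy class scores for 3 samples over 4 classes (hypothetical values).
scores = [[0.1, 0.2, 0.6, 0.1],   # argmax -> 2
          [0.7, 0.1, 0.1, 0.1],   # argmax -> 0
          [0.2, 0.5, 0.2, 0.1]]   # argmax -> 1
labels = [2, 0, 3]

# Predicted class = index of the largest score per sample.
preds = [max(range(len(s)), key=s.__getitem__) for s in scores]
accuracy = 100 * sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(preds, accuracy)  # [2, 0, 1] and 2 of 3 correct -> ~66.7%
```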
# Returns a data loader with the specified batch size
def get_data_loader(data, batch_size):
    data_loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    return data_loader
# Train a homemade model with different hyperparameters
def train_model(model, num_epochs, batch_size=128, lr=1e-2, optim_adam=True, use_cuda=True):
    '''
    Inputs:
        model: nn.Module representing the net to train
        num_epochs: number of epochs to train the model over
        batch_size: batch size for training
        lr: learning rate
        optim_adam: boolean choosing the Adam optimizer (True) or SGD (False)
    Output:
        model_path, plus the labels and predictions from the final epoch
    Note: training results are saved under the path returned by get_model_name
    '''
    train_loader = get_data_loader(train_data, batch_size)
    if use_cuda and torch.cuda.is_available():
        model.to(device)
    criterion = nn.CrossEntropyLoss()
    if optim_adam:
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=0.1)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-5)
    train_err = np.zeros(num_epochs)
    train_losses = np.zeros(num_epochs)
    train_accs = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    val_accs = np.zeros(num_epochs)
    model.train()
    for epoch in range(num_epochs):
        train_loss = 0
        total_epochs = 0
        labels = []
        predictions = []
        for batch, (X, label) in enumerate(train_loader):
            X, label = X.to(device), label.to(device)
            optimizer.zero_grad()
            y_pred = model(X)
            loss = criterion(y_pred, label)
            train_loss += loss.item()
            loss.backward()
            optimizer.step()
            y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
            total_epochs += len(label)
            labels.append(label.tolist())
            predictions.append(y_pred_class.tolist())
        train_accs[epoch] = get_accuracy(model, train_data, len(train_data))
        val_accs[epoch] = get_accuracy(model, val_data, len(val_data))
        model_path = get_model_name(model.name, batch_size, lr, epoch)
        torch.save(model.state_dict(), model_path)
        print(f'Epoch #{epoch}: Training Accuracy - {train_accs[epoch]}%, Validation Accuracy - {val_accs[epoch]}%')
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_losses)
    np.savetxt("{}_train_acc.csv".format(model_path), train_accs)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)
    np.savetxt("{}_val_acc.csv".format(model_path), val_accs)
    return model_path, sum(labels, []), sum(predictions, [])
# Plotting the results
def plot_training_curve(path):
    train_err = np.loadtxt("{}_train_err.csv".format(path))
    val_err = np.loadtxt("{}_val_err.csv".format(path))
    train_loss = np.loadtxt("{}_train_loss.csv".format(path))
    val_loss = np.loadtxt("{}_val_loss.csv".format(path))
    plt.title("Train vs Validation Error")
    n = len(train_err)  # number of epochs
    plt.plot(range(1, n+1), train_err, label="Train")
    plt.plot(range(1, n+1), val_err, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Error")
    plt.legend(loc='best')
    plt.show()
    plt.title("Train vs Validation Loss")
    plt.plot(range(1, n+1), train_loss, label="Train")
    plt.plot(range(1, n+1), val_loss, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc='best')
    plt.show()
# Plots the training and validation accuracy curve
def plot_accuracy(path, lr, batch):
    train_acc = np.loadtxt("{}_train_acc.csv".format(path))
    val_acc = np.loadtxt("{}_val_acc.csv".format(path))
    plt.title("Train & Val Accuracy, lr = {}, batch = {}".format(lr, batch))
    n = len(train_acc)  # number of epochs
    plt.plot(range(1, n+1), train_acc, label="Train")
    plt.plot(range(1, n+1), val_acc, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(loc='best')
    plt.show()
# Plots the confusion matrix based on the list of labels and predictions
def plot_confusion_matrix(model, pred_label, true_label, lr, batch):
    confusion_matrix = metrics.confusion_matrix(list(true_label), list(pred_label))
    # display_labels expects the list of class names, not the class_to_idx dict
    cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix=confusion_matrix,
                                                display_labels=class_names)
    cm_display.plot()
    plt.title("Confusion Matrix of {} Network, lr = {}, batch = {}".format(model.name, lr, batch))
    plt.show()
def get_prediction_dist(y_pred_prob, class_names):
    # class_names = ['glioma', 'meningioma', 'notumor', 'pituitary']
    # (indexing by position requires the list of names, not the class_to_idx dict)
    sns.set(font_scale=1.4)  # Set font scale for readability
    # shade= is deprecated in recent seaborn versions; fill= is the replacement
    sns.kdeplot(data=y_pred_prob[:, 0], label=class_names[0], fill=True)
    sns.kdeplot(data=y_pred_prob[:, 1], label=class_names[1], fill=True)
    sns.kdeplot(data=y_pred_prob[:, 2], label=class_names[2], fill=True)
    sns.kdeplot(data=y_pred_prob[:, 3], label=class_names[3], fill=True)
    plt.xlabel('Predicted probability')
    plt.ylabel('Density')
    plt.title('Prediction Distribution')
    plt.legend()
    plt.show()
# Shows the model prediction and label of 8 images from the dataset
def show_testing(model, test_data):
    show_loader = get_data_loader(test_data, 8)
    img = next(iter(show_loader))
    prediction = list(torch.argmax(torch.softmax(model(img[0]), dim=1), dim=1))
    plt.figure(figsize=[15, 20])
    for i in range(8):
        plt.subplot(4, 2, i+1)
        plt.axis('off')
        plt.imshow(img[0][i].permute(1, 2, 0))
        title = f'Label: {test_data.classes[img[1][i]]}, Pred: {test_data.classes[prediction[i]]}'
        plt.title(title)
# These are the different values used for the hyperparameter tuning
num_epochs = [25, 50]
batch_sizes = [128, 256]
learning_rates = [0.01, 0.005]
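The 8 hyperparameter combinations used in the grid search can be enumerated with `itertools.product`; a small sketch:

```python
from itertools import product

num_epochs = [25, 50]
batch_sizes = [128, 256]
learning_rates = [0.01, 0.005]

# Cartesian product of the three lists gives all 2 x 2 x 2 = 8 runs.
configs = list(product(num_epochs, batch_sizes, learning_rates))
print(len(configs))  # 8
for ne, bs, lr in configs:
    print(f"epochs={ne}, batch={bs}, lr={lr}")
```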
num_epochs = 25, batch_size = 128, learning_rate = 0.01:
This model overfit, and both the training and validation accuracies were low.
num_epochs = 25, batch_size = 128, learning_rate = 0.005:
This model also overfit, but the training and validation accuracies were higher than the previous model's.
num_epochs = 25, batch_size = 256, learning_rate = 0.01:
This model was less overfit than the previous models, but its accuracy was slightly lower.
num_epochs = 25, batch_size = 256, learning_rate = 0.005:
This model had a high accuracy while not being too overfit. This was considered the best model.
num_epochs = 50, batch_size = 128, learning_rate = 0.01:
The accuracy curves fluctuated constantly and never appeared to converge.
num_epochs = 50, batch_size = 128, learning_rate = 0.005:
This model was extremely overfit, as the gap between the training and validation accuracy curves kept widening. This could be due to the large number of epochs.
num_epochs = 50, batch_size = 256, learning_rate = 0.01:
This model was not too overfit, but the final accuracies were too low.
num_epochs = 50, batch_size = 256, learning_rate = 0.005:
This model also showed extreme overfitting, as the validation accuracy plateaued while the training accuracy continued to increase.
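The overfitting described above can be quantified as the gap between the final training and validation accuracies. A minimal sketch with illustrative accuracy curves (the numbers below are made up, not from the project's runs):

```python
def overfit_gap(train_acc, val_acc, window=5):
    """Average train-minus-validation accuracy gap over the last `window` epochs."""
    t = sum(train_acc[-window:]) / window
    v = sum(val_acc[-window:]) / window
    return t - v

# Illustrative curves: training keeps rising while validation plateaus.
train = [60, 65, 70, 74, 77, 80, 83, 86, 88, 90]
val   = [58, 62, 66, 69, 71, 72, 72, 73, 72, 73]
gap = overfit_gap(train, val)
print(round(gap, 1))  # 13.0 -- a widening gap like this signals overfitting
```

Averaging over the last few epochs smooths out the per-epoch fluctuations visible in the accuracy charts.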
# Tip: re-run this cell for every set of hyperparameters, or create a new
# model for each set (which uses more computing resources)
vgg_tf_model = models.vgg16(weights='DEFAULT')
vgg_tf_model.name = 'VGG16'
for param in vgg_tf_model.parameters():
    param.requires_grad = False  # freeze all pre-trained weights
vgg_tf_features = vgg_tf_model.features
vgg_tf_features.name = 'VGG16_Features'
vgg_tf_classifier = vgg_tf_model.classifier
vgg_tf_classifier.name = 'VGG16_Classifier'
vgg_tf_class_num_inputs = vgg_tf_classifier[-1].in_features
# Replace the 1000-class output layer with a trainable 4-class head
vgg_tf_classifier[-1] = nn.Sequential(
    nn.Linear(vgg_tf_class_num_inputs, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, num_classes))
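Since only the replacement head is trainable, its parameter count can be checked by hand. A quick arithmetic sketch, assuming VGG-16's standard 4096-unit penultimate layer and the 256-unit hidden layer defined above:

```python
# Trainable parameters of the new classifier head:
# Linear(4096 -> 256) plus Linear(256 -> 4), counting weights and biases.
in_features, hidden, num_classes = 4096, 256, 4
head_params = (in_features * hidden + hidden) + (hidden * num_classes + num_classes)
print(head_params)  # 1049860
# For comparison, the full VGG-16 has roughly 138 million parameters,
# all of which stay frozen except this head.
```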
ne = 25
bs = 128
lr = 0.01
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 25 epochs: training accuracy rose from 60.11% (epoch 0) to 70.30% (epoch 24); validation accuracy rose from 56.22% to 67.97%. Testing accuracy: 63.77%.]
ne = 25
bs = 128
lr = 0.005
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 25 epochs: training accuracy rose from 70.56% (epoch 0) to 77.12% (epoch 24); validation accuracy rose from 68.15% to 73.19%. Testing accuracy: 69.95%.]
ne = 25
bs = 256
lr = 0.01
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 25 epochs: training accuracy rose from 53.80% (epoch 0) to 75.70% (epoch 24); validation accuracy rose from 52.92% to 72.32%. Testing accuracy: 69.03%.]
ne = 25
bs = 256
lr = 0.005
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 25 epochs: training accuracy rose from 61.68% (epoch 0) to 77.99% (epoch 24); validation accuracy rose from 59.01% to 74.24%. Testing accuracy: 70.86%.]
ne = 50
bs = 128
lr = 0.01
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 50 epochs: training accuracy rose from 65.84% (epoch 0) to 72.92% (epoch 49), with both curves fluctuating throughout; validation accuracy rose from 65.01% to 70.58%. Testing accuracy: 66.59%.]
ne = 50
bs = 128
lr = 0.005
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 50 epochs: training accuracy rose from 70.93% (epoch 0) to 78.27% (epoch 49); validation accuracy rose from 69.63% to 74.24%. Testing accuracy: 70.25%.]
ne = 50
bs = 256
lr = 0.01
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 50 epochs: training accuracy rose from 60.66% (epoch 0) to 74.68% (epoch 49); validation accuracy rose from 60.14% to 70.15%. Testing accuracy: 66.67%.]
ne = 50
bs = 256
lr = 0.005
vgg_path, vgg_labels, vgg_preds = train_vgg_model(vgg_tf_classifier, ne, bs, lr)
plot_accuracy(vgg_path, lr, bs)
plot_confusion_matrix(vgg_tf_model, vgg_preds, vgg_labels, lr, bs)
[Training log, 50 epochs: training accuracy rose from 70.02% (epoch 0) to 80.10% (epoch 49); validation accuracy rose from 68.15% to 75.98%. Testing accuracy: 70.79%.]
Following the same criteria used for the VGG-16 transfer learning model, we found that the best homemade model had the following hyperparameters:
num_epochs = 25, batch_size = 128, learning_rate = 0.01
We also observed that training the homemade models took much longer than training the VGG-16 model. Even though the homemade CNN had far fewer trainable parameters, every training step had to compute and backpropagate through all the convolutions in its feature extractor, which are computationally expensive. In comparison, the VGG-16 model's trainable parameters were confined to the new final block of the classifier.
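The cost difference comes from where the multiply-accumulates happen, not from parameter counts. A rough back-of-the-envelope sketch with assumed layer sizes (a 3x3 convolution on a 128x128 input versus a fully connected layer; these sizes are illustrative, not the project's exact architecture):

```python
# Multiply-accumulate (MAC) estimates for one image.
# Conv layer: every output position applies the full kernel.
h = w = 128                  # assumed feature-map size
c_in, c_out, k = 3, 32, 3    # assumed channels and kernel size
conv_macs = h * w * c_in * c_out * k * k

# Linear layer: one MAC per weight.
linear_macs = 4096 * 256

print(conv_macs)    # 14155776
print(linear_macs)  # 1048576
# The conv layer has only c_in*c_out*k*k + c_out = 896 parameters, yet it
# costs roughly 13x more compute than the ~1M-parameter linear layer.
```

This is why a small CNN trained end-to-end can be slower per step than a frozen VGG-16 whose gradient updates only touch the classifier head.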
# Run this cell to run through all 8 sets of hyperparameters
# Note: this takes a very long time (~3-4 hours), so you may be timed out
# Tip: Run each individual set of parameters in its own cell, as was done
# for the VGG-16 model
for lr in learning_rates:
    for ne in num_epochs:
        for bs in batch_sizes:
            hm_model = HomemadeCNN()
            hm_path, hm_labels, hm_preds = train_model(hm_model, ne, bs, lr)
            plot_accuracy(hm_path, lr, bs)
            plot_confusion_matrix(hm_model, hm_preds, hm_labels, lr, bs)
# Each set of hyperparameters was run separately on other Colab notebooks and
# this one, which produced the best results, was re-run here.
ne = 25
bs = 128
lr = 0.01
hm_model = HomemadeCNN()
hm_path, hm_labels, hm_preds = train_model(hm_model, ne, bs, lr)
plot_accuracy(hm_path)
plot_confusion_matrix(hm_model, hm_preds, hm_labels)
[Training log: epochs #0–#24 of the best homemade CNN run. Training accuracy rose from 44.5% to about 68.9%; validation accuracy rose from 45.3% to about 67.9%.]
[Output truncated: KeyboardInterrupt raised inside train_model — the full 8-set hyperparameter sweep was interrupted manually.]
The best performing VGG-16 transfer learning model had the following parameters:
learning_rate = 0.005, batch_size = 256, num_epochs = 25.
The training accuracy ended around 77.5% and the validation accuracy ended around 72.5%. This model showed the highest validation accuracy as well as the least overfitting. Training this VGG-16 model was also much quicker than the homemade CNN because only its final classifier layers were trainable.
The model mostly confused glioma and meningioma with each other, and also misclassified meningioma tumours as pituitary. The training/validation accuracy curve and the confusion matrix are shown below.
The best performing homemade CNN model had the following parameters:
learning_rate = 0.01, batch_size = 128, num_epochs = 25.
The training accuracy ended around 68% and the validation accuracy ended around 67%. This model showed no signs of overfitting or underfitting, but had a lower overall accuracy than the VGG-16 model. Once again, the model mostly confused glioma and meningioma with each other, and also predicted many images as pituitary. The training/validation accuracy curves and the confusion matrix are shown below.
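The internals of the notebook's `plot_confusion_matrix` helper are not reproduced here, but what such a matrix tallies can be sketched in a few lines of plain Python (rows are true classes, columns are predicted classes, in the dataset's class order):

```python
CLASSES = ['glioma', 'meningioma', 'no tumour', 'pituitary']

def confusion_counts(labels, preds, num_classes=len(CLASSES)):
    """cm[i][j] = number of class-i images that the model predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(labels, preds):
        cm[true][pred] += 1
    return cm

# Toy example: two gliomas (class 0) misread as meningioma land at row 0, column 1
cm = confusion_counts([0, 0, 1, 3], [1, 1, 1, 3])
```

Off-diagonal hot spots like `cm[0][1]` and `cm[1][0]` are exactly the glioma/meningioma confusion discussed above.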
best_vgg_model = models.vgg16(weights='DEFAULT').to(device)
best_vgg_model.name = 'Best_VGG_Model'
best_vgg_model_classifier = best_vgg_model.classifier
best_vgg_model_classifier.name = 'Best_VGG_Model_Classifier'
best_vgg_model_class_inputs = best_vgg_model_classifier[-1].in_features
best_vgg_model_classifier[-1] = nn.Sequential(
    nn.Linear(best_vgg_model_class_inputs, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, num_classes))
best_vgg_model_classifier.load_state_dict(torch.load('/content/model_VGG16_Classifier_bs256_lr0.005_epoch24', map_location=torch.device('cpu')))
<All keys matched successfully>
best_vgg_model.to(device)
print(f'Best model accuracy on new data: {get_accuracy(best_vgg_model, test_dem, len(test_dem), use_cuda=True).item(): .2f}%')
Best model accuracy on new data: 47.72%
The best VGG-16 model was tested against the test split of the original dataset as well as a new dataset. The model had an accuracy of 70.90% on the original test dataset and an accuracy of 47.72% on the new dataset. The figures below show 2 sets of 8 random images from the new dataset with their labels and predictions.
show_testing(best_vgg_model.to('cpu'), test_dem)
show_testing(best_vgg_model.to('cpu'), test_dem)
best_hm_model = HomemadeCNN()
best_hm_model.load_state_dict(torch.load('/content/best_hm_model', map_location=torch.device('cpu')))
<All keys matched successfully>
get_accuracy(best_hm_model, test_data, len(test_data), use_cuda=False)
tensor(68.5736)
get_accuracy(best_hm_model, test_dem, len(test_dem), use_cuda=False)
tensor(36.5482)
The best homemade CNN model was tested against the test split of the original dataset as well as a new dataset. The model had an accuracy of 68.57% on the original test dataset and an accuracy of 36.55% on the new dataset. The figure below shows 8 random images from the new dataset with their labels and predictions.
show_testing(best_hm_model, test_dem)
The accuracy of both models dropped sharply between the original test data and the new data: roughly 23 percentage points for the VGG-16 model and 32 for the homemade CNN. We started looking into why the models performed so much worse and noticed a couple of issues.
Firstly, we noticed that some of the images from the new dataset had a vertical flip applied to them. This contradicted the papers we read: previous research has shown that brain MRI images should only be horizontally flipped during data augmentation, to avoid introducing bias during training. The image below shows an example of a vertical flip, where the cerebellum and spinal cord are clearly visible at the top of the image.
We also noticed that images from the new dataset tended to be more zoomed in on the MRI scan itself than our original dataset, so the new data typically had less black background near the edges. We could potentially get better accuracy on the new data if we added a couple more transforms, such as zooming out and padding with black background to maintain the same image size. The top image is an example from the original dataset and the bottom image is an example from the new dataset.
Another interesting finding was that many of our VGG-16 transfer learning models were overfit. This was most likely because there was only one trainable layer in the classifier, which is easy to overtrain.
In the future, there are a couple of ways to remedy this. Firstly, we could try different architectures for the classifier. We could also tune additional hyperparameters such as the dropout rate, and introduce regularization during training.
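For instance, L2 regularization can be added without changing the model at all, via the optimizer's `weight_decay` argument. The value below is illustrative, not one we tuned:

```python
import torch
import torch.nn as nn

# Stand-in for the project's classifier head
head = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Dropout(0.5),
                     nn.Linear(256, 4))

# weight_decay applies an L2 penalty to the weights at every update step;
# 1e-4 is a common starting point, and would itself become a tunable hyperparameter
optimizer = torch.optim.Adam(head.parameters(), lr=0.005, weight_decay=1e-4)
```

The dropout probability (0.5 above) could be swept over the same grid-search loop used for batch size, epochs, and learning rate.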
There were several challenges that we faced while working on this project. Firstly, data augmentation was not straightforward because MRI scanning is a standardized procedure, so MRI images do not vary much in how the brain is positioned within the frame. This made certain transforms non-viable, such as vertically flipping the image (i.e., flipping it top to bottom). We reasoned that this tailored data augmentation would better represent MRI scans in a real-life scenario. Secondly, better knowledge of the tumour classes would have helped us explain and improve our model performance. For instance, the confusion matrices showed that many models confused glioma and meningioma tumours. To improve performance in the future, we could research what differentiates the two classes, feature-wise, to help the models extract those features for better classification accuracy.
One of the biggest lessons we learned during this project was the importance of working in parallel across different Colab notebooks to increase our efficiency. This helped us avoid runtime expiries and accelerated both model testing and our hyperparameter tuning procedure.
Nonetheless, it was still hard for our group to consistently reproduce and re-run our results due to limited GPU capacity. We only had access to the free tier of Google Colab, while other groups appeared able to run their models on other servers for days.
The biggest recommendation we have going forward is to implement a fine-tuning method for the VGG-16 net. We realized that the last layers of the feature extractor are the most important for high accuracy, as they create the latent space. If we were able to train them, we could build a feature space more customized to MRI data, which should produce a better model. However, due to our limited GPU capabilities, we were unable to try fine-tuning ourselves, as it required more computing resources than we had.
Işın, A., Direkoğlu, C., & Şah, M. (2016). Review of MRI-based brain tumor image segmentation using deep learning methods. Procedia Computer Science, 102, 317-324.
Nalepa, J., Marcinkiewicz, M., & Kawulok, M. (2019). Data augmentation for brain-tumor segmentation: a review. Frontiers in Computational Neuroscience, 13, 83. https://www.frontiersin.org/articles/10.3389/fncom.2019.00083/full
%%shell
jupyter nbconvert --to html MIE_1517_Group_11_Final_Project.ipynb
[NbConvertApp] Converting notebook MIE_1517_Group_11_Final_Project.ipynb to html [NbConvertApp] Writing 879303 bytes to MIE_1517_Group_11_Final_Project.html